Extractive Spoken Document Summarization with Representation Learning Techniques

نویسندگان

  • Kai-Wun Shih
  • Kuan-Yu Chen
  • Shih-Hung Liu
  • Hsin-Min Wang
  • Berlin Chen
چکیده

The rapidly increasing availability of multimedia associated with spoken documents on the Internet has prompted automatic spoken document summarization to be an important research subject. Thus far, the majority of existing work has focused on extractive spoken document summarization, which selects salient sentences from an original spoken document according to a target summarization ratio and concatenates them to form a summary concisely, in order to convey the most important theme of the document. On the other hand, there has been a surge of interest in developing representation learning techniques for a wide variety of natural language processing (NLP)-related tasks. However, to our knowledge, they are largely unexplored in the context of extractive spoken document summarization. With the above background, this study explores a novel use of both word and sentence representation techniques for extractive spoken document summarization. In addition, three variants of sentence ranking models building on top of such representation techniques are proposed. Furthermore, extra information cues like the prosodic features extracted from spoken documents, apart from the lexical features, are also employed for boosting the summarization performance. A series of experiments conducted on the MATBN broadcast news corpus indeed reveal the performance merits of our proposed summarization methods in relation to several state-of-the-art baselines.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extractive spoken document summarization for information retrieval

The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In the paper, we proposed the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling...

متن کامل

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...

متن کامل

Extractive Chinese Spoken Document Summarization Using Probabilistic Ranking Models

The purpose of extractive summarization is to automatically select indicative sentences, passages, or paragraphs from an original document according to a certain target summarization ratio, and then sequence them to form a concise summary. In this paper, in contrast to conventional approaches, our objective is to deal with the extractive summarization problem under a probabilistic modeling fram...

متن کامل

Leveraging word embeddings for spoken document summarization

Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation. On the other hand, word embedding has emerged as a newly fa...

متن کامل

Positional language modeling for extractive broadcast news speech summarization

Extractive summarization, with the intention of automatically selecting a set of representative sentences from a text (or spoken) document so as to concisely express the most important theme of the document, has been an active area of experimentation and development. A recent trend of research is to employ the language modeling (LM) approach for important sentence selection, which has proven to...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJCLCLP

دوره 20  شماره 

صفحات  -

تاریخ انتشار 2015